Appendix A, Architecture

How the Catapult Server Proxy Service Works

How the RWS Service Works

How the Proxy and RWS Services Work Together

Authentication and Security

How the Catapult Server Proxy Service Works

The Standard Proxy Protocol

The Catapult Server Proxy Service

The Standard Proxy Protocol

Most Internet applications, including File Transfer Protocol (FTP), World Wide Web (WWW), and gopher, use a client/server architecture. These applications use the conventions established in standard protocols for communication between client applications and server applications. HTTP, as well as the first code libraries to support WWW client and server applications, has its origins as a UNIX-based service at Switzerland’s Conseil Européen pour la Recherche Nucléaire (European Laboratory for Particle Physics, or CERN). As the CERN staff added support for application-aware proxies to their libraries, the WWW community built on these additions, and the CERN-proxy protocol became an accepted industry standard. Catapult Server Proxy Service is fully compatible with the CERN-proxy protocol.

While CERN-compatible proxy services support WWW (HTTP), FTP, and gopher requests, all communication between a client and the proxy server uses HTTP. HTTP defines a set of commands (called methods) that a client can send to a server. The two most common methods are GET and POST. GET is used to forward a Uniform Resource Locator (URL) to a server, requesting the resource to which the URL refers. POST is used to forward a request that contains a URL and data; typically, a user provides this data by completing a Hypertext Markup Language (HTML) form.

When a browser that is not configured for proxy service sends an HTTP URL directly to a WWW server, it sends the server the GET method, which includes the path and resource-name requested. The browser removes the protocol and site name from the URL (http://domainname). For example, when you type the URL

http://host.com/sales/report.htm

on the command line of a browser that is not configured for a proxy, the browser sends this command to Host.com:

GET /sales/report.htm

When a browser is configured to use a proxy server, the browser uses HTTP and the GET method to send WWW, FTP, and gopher requests to the proxy server. In this case, the GET method includes the protocol name as well as the site name of the server. For example, a WWW (HTTP) request for a document entitled Doc.htm in the Sales directory on the server Host.com is sent to the proxy as GET http://host.com/sales/doc.htm.

The process is as follows:

The Web browser sends the proxy-formatted request to the Proxy server.
The Proxy server:
- Receives the request
- Parses the URL
  If the URL is in the cache, the request is serviced to the Web browser from the cache.
- Identifies the request as HTTP
- Resolves the domain name to an IP address
- Requests Doc.htm from the Web server by using the appropriate protocol.
  In this case, because the URL specified HTTP, HTTP is the appropriate protocol. (Note that a request sent to a Web server is formatted by the Proxy service as a standard, non-proxy request.)
The Web server:
- Receives the request
- Responds by sending Doc.htm to the proxy using HTTP
The Proxy server:
- Receives Doc.htm
- Sends Doc.htm to the browser using HTTP
The Web browser receives Doc.htm and displays it on-screen, completing the process.

Here is an example of an FTP request via a proxy for document Q296.doc. Note that the HTTP protocol is used by the browser to send a GET method to the proxy. The GET method contains an FTP URL.

In this case, the proxy service identifies the request as an FTP request, and requests Q296.doc from host.com using FTP. Host.com then returns Q296.doc to the proxy using FTP, and the proxy uses HTTP to send the document to the client.

Browsers that are not configured for the proxy service issue FTP and gopher requests using the FTP and gopher protocols, respectively. A browser that is not configured for the proxy service cannot issue FTP or gopher requests by using HTTP.

Configuring a browser to communicate with a proxy server actually simplifies the work that the browser needs to do, because all requests are processed with HTTP and have complete URLs.
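The difference between the two request forms can be sketched in a few lines of Python. This is an illustration only: the function name build_request_line is hypothetical, and the request lines follow the abbreviated form used in this section (an actual HTTP request line also carries a protocol version).

```python
from urllib.parse import urlsplit


def build_request_line(url: str, via_proxy: bool) -> str:
    """Return the GET line a browser would send for `url` (illustrative only)."""
    if via_proxy:
        # Proxy-configured browser: the complete URL, protocol and site name
        # included, is sent to the proxy over HTTP (this covers FTP and gopher URLs too).
        return f"GET {url}"
    parts = urlsplit(url)
    if parts.scheme != "http":
        # Without a proxy, FTP and gopher requests use their own protocols.
        raise ValueError("a direct HTTP GET needs an http:// URL")
    # Direct request: the protocol and site name are removed from the URL.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return f"GET {path}"
```

For example, build_request_line("http://host.com/sales/report.htm", via_proxy=False) yields the short form sent directly to the Web server, while the same call with via_proxy=True keeps the complete URL.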

The Catapult Server Proxy Service

About the Catapult Server Proxy Service

Proxy ISAPI Filter

Proxy ISAPI Application

Caching Mechanisms

About the Catapult Server Proxy Service

Application-level proxy services have knowledge of the protocol used by the applications they support. This knowledge allows the proxy to offer additional features, such as user authentication, protocol conversions, and local caching of retrieved content. These features add security, improve response time and access control, and decrease network usage.

A World Wide Web proxy performs functions associated with both clients and servers. As a server, it receives WWW requests from private network clients; as a client, it responds to private network clients’ requests by issuing the appropriate requests to a WWW server on the Internet. The cusp between the client and server components of the proxy service provides opportunities to add increased security and functionality, making a proxy more secure and feature-rich, instead of just forwarding data packets at the transport layer.

Catapult Server Proxy service runs as an extension to Microsoft Internet Information Server (IIS) version 2.0. To install Catapult Server, you must have both Windows NT 4.0 and IIS 2.0 installed. (Typically, IIS is installed when Windows NT 4.0 is installed, or when you upgrade to Windows NT 4.0 from a previous version of Windows NT.) Catapult Server Proxy service is implemented as a dynamic-link library (DLL) that uses the Internet Server Application Programming Interface (ISAPI), and therefore runs within the process of the IIS WWW service. (For more information about ISAPI and about the IIS WWW service, see the Installation and Administration Guide For Microsoft Internet Information Server, an online book installed with that product.) The WWW service must be installed and running in order for proxy requests to be processed. Because all proxy requests for Web, gopher, or FTP resources are sent from the client to the proxy using HTTP, it is both convenient and efficient for the IIS WWW service to receive these requests and pass them on to the Catapult Server Proxy service DLL by means of ISAPI.

Functionally, the Catapult Server Proxy service consists of two components, the Proxy ISAPI Filter and the Proxy ISAPI application.

Proxy ISAPI Filter

The ISAPI filter interface allows for registration of an extension that the Web server calls whenever it receives an HTTP request from a client. An ISAPI filter is called for every request, regardless of such details as the identity of the resource requested in the URL. Thus, an ISAPI filter can monitor, log, modify, redirect, or authenticate requests sent to the Web server.

The WWW service can call the ISAPI filter DLL’s entry point at various times during a request-and-response sequence. When the ISAPI filter is loaded, it programmatically registers the notification points in which it is interested. The WWW service then starts calling the ISAPI filter DLL’s entry point at each requested notification point for each HTTP request.

The Proxy ISAPI filter registers itself to use the SF_NOTIFY_PREPROC_HEADERS notification point, because when called at this notification point for a request, the Proxy Filter can see the URL sent by the client (and can modify the URL), before the Web server processes the request. The Proxy ISAPI filter examines each request to determine if the request is a proxy request or a standard HTTP request. For more information about the ISAPI filter interface, see the ActiveX Development Kit, available at http://www.microsoft.com/intdev/.

If the request is a proxy request (that is, if the request contains a URL complete with protocol and domain name as described in The Standard Proxy Protocol, earlier in this chapter) the Proxy ISAPI filter adds the name of the Proxy ISAPI application (W3proxy.dll) to the URL. This causes the WWW service to forward the request to the Proxy ISAPI application for processing.

If the request is not a proxy request (that is, if the request does not contain a protocol and a domain name), then the request is for a Web resource on the proxy server. The Proxy ISAPI filter does not modify the request, and normal processing within the Web server continues, thus allowing Web publishing to work normally.
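The filter's decision can be sketched as follows. This is a minimal Python illustration, not the ISAPI filter itself (which is a DLL called at SF_NOTIFY_PREPROC_HEADERS). The function names are hypothetical, and the rewritten URL format is an assumption: the text says only that the name W3proxy.dll is added to the URL, not how.

```python
from urllib.parse import urlsplit

PROXY_APP = "W3proxy.dll"                  # application name given in the text
PROXIED_SCHEMES = {"http", "ftp", "gopher"}


def is_proxy_request(request_url: str) -> bool:
    """A proxy request carries a complete URL: protocol plus domain name."""
    parts = urlsplit(request_url)
    return parts.scheme in PROXIED_SCHEMES and bool(parts.netloc)


def preprocess_url(request_url: str) -> str:
    """Route proxy requests to the proxy application; leave others untouched."""
    if is_proxy_request(request_url):
        # Hypothetical rewrite format, chosen only for illustration.
        return f"/{PROXY_APP}?{request_url}"
    # Standard request for a resource published on the proxy server itself.
    return request_url
```

A request for /sales/report.htm passes through unchanged, so Web publishing on the same server is unaffected.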

Proxy ISAPI Application

The ISAPI application interface uses an in-process mechanism to extend Web server functionality. Unlike Common Gateway Interface (CGI), another mechanism for extending Web server functionality, an ISAPI application does not initiate a new process for every request. ISAPI applications can create dynamic Hypertext Markup Language (HTML) and integrate the Web with other service applications such as databases.

An ISAPI application DLL loads once; thereafter, the Web server calls the DLL whenever it receives a client request for that application. The Proxy ISAPI application is contained within W3proxy.dll, which also contains the Proxy ISAPI filter. Because both application and filter reside in W3proxy.dll, all necessary initialization for both is done when the server is started.

Every time it receives a request, the Proxy ISAPI application does the following:

Authenticates the client.
Applies the domain filter.
Looks for objects in the cache (and returns them from the cache if found and current).
Gets the objects from the Internet, sends them to the client, and adds them to the cache if appropriate.

If a request is valid, and it is necessary to issue the request to an Internet site, the ISAPI application parses the URL to extract the protocol (HTTP, FTP, gopher), and the domain name. For HTTP requests, the ISAPI application calls the appropriate Windows Sockets APIs directly to process file requests.

For HTTP requests, all input/output (I/O) is done asynchronously after the domain name has been resolved. If possible, Domain Name System (DNS) is used to resolve the domain name. (DNS is used to resolve Internet or UNIX system names, and Microsoft TCP/IP includes DNS support.)

Issuing the request to the Internet site includes the following steps:

Resolve the domain name to an IP address (use the DNS cache if possible).
Connect to the remote site.
Send the request to the remote site.
Receive the response header from the remote site.
Receive the data (send to the client and save in the cache).
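The steps above can be sketched in Python. This is an illustration under stated assumptions: the DnsCache class, its fixed time-to-live, and the fetch helper are hypothetical, and the sketch uses blocking calls for brevity, whereas the text describes asynchronous I/O after name resolution.

```python
import socket
import time
from typing import Callable, Dict, Tuple


class DnsCache:
    """Remember resolved addresses for a fixed time (hypothetical policy)."""

    def __init__(self,
                 resolver: Callable[[str], str] = socket.gethostbyname,
                 ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._resolver = resolver
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def resolve(self, host: str) -> str:
        key = host.lower()
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and cached[1] > now:
            return cached[0]                  # served from the DNS cache
        address = self._resolver(host)        # otherwise ask the resolver
        self._entries[key] = (address, now + self._ttl)
        return address


def fetch(host: str, path: str, cache: DnsCache, port: int = 80) -> Tuple[bytes, bytes]:
    """Resolve, connect, send the request, then read the header and the data."""
    address = cache.resolve(host)
    with socket.create_connection((address, port), timeout=10) as conn:
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
        conn.sendall(request.encode("ascii"))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:                      # remote site closed the connection
                break
            chunks.append(data)
    header, _, body = b"".join(chunks).partition(b"\r\n\r\n")
    return header, body
```

Injecting the resolver and the clock keeps the cache easy to exercise without network access.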

The Proxy ISAPI application uses a small set of reusable worker threads and asynchronous I/O to achieve very high performance. An Asynchronous Thread Queue (ATQ) and the TransmitFile API further enhance thread efficiency. The Proxy ISAPI application benefits from the high performance and scalability built into IIS, as well as from its own architecture.

By running as an ISAPI application, the proxy server benefits from many other functional and performance features of IIS. An example is the Web server’s support for HTTP Keep-Alives. Keep-Alives is a feature that, when supported by both browser and server, allows TCP connections to remain open after a request/response is completed. This significantly improves performance if another request is made from the same client to the same server within the time limit for connections. Support for Keep-Alives requires that the Web server be able to return the byte size of responses to clients, and that both client and server use time-outs to manage Windows Sockets connections efficiently.

Because HTML pages often have several image links to graphics files on the same server as the HTML file, for non-proxy environments, Keep-Alives often improve performance, even when another HTML page is not requested.

For a proxy server, the Keep-Alives mechanism is much more valuable. In a typical scenario, a company has a small number of computers running Catapult Server for Internet access. Every attempt within the company to access Internet Web, gopher, and FTP sites requires a connection from a browser on the internal network to one of these proxy server computers. The probability of reusing a connection between the same client and the same proxy server is very high. Internet Explorer version 3.0 will support Keep-Alives to a proxy server.
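The two conditions named above (support on both sides, and a known response size) can be sketched as a small check. This is an illustration only: the function name is hypothetical, and it assumes the HTTP/1.0-style "Connection: Keep-Alive" and "Content-Length" headers, which the text does not name.

```python
from typing import Mapping


def _header(headers: Mapping[str, str], name: str) -> str:
    """Case-insensitive header lookup; returns "" when the header is absent."""
    for key, value in headers.items():
        if key.lower() == name:
            return value.strip()
    return ""


def can_keep_alive(request_headers: Mapping[str, str],
                   response_headers: Mapping[str, str]) -> bool:
    """Decide whether the TCP connection may stay open after this response."""
    # Assumption: the client asks for a persistent connection with this header.
    client_wants = "keep-alive" in _header(request_headers, "connection").lower()
    # The server must be able to state the byte size of the response, so the
    # client can tell where one response ends and the next begins.
    size_known = _header(response_headers, "content-length").isdigit()
    return client_wants and size_known
```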

Caching Mechanisms

About Caching

The Catapult Server Proxy service uses caching to maintain a local copy of Web objects. This allows subsequent requests for these objects to be serviced from a local disk copy rather than issuing the request over the Internet, thereby improving user-perceived performance and reducing bandwidth consumption on the site's Internet connection.

Not all objects that pass through Catapult Server Proxy service can or should be cached. Some objects are dynamic and change frequently; some change every time they are accessed. Other objects require authentication of the requesting client and cannot be cached for security reasons.

A Web object must satisfy the following criteria in order to be cached:

The request must be a GET.
The request URL must not contain ?keywords (a query string).
The file must be served by the FTP, HTTP, or gopher protocols (objects associated with other protocols are not cached).
The HTTP header must not include “WWW-Authenticate,” “Pragma: no-cache,” “Cache-control: Private,” “Cache-control: no-cache,” or “Set-Cookie.”
The date in the Expires HTTP header field must be later than the one in the Date header. (The Date header is one of the fields returned in almost all HTTP responses. It indicates the date and time that the Web server received the request. Some Web servers indicate to downstream caches that a page shouldn't be cached by setting the Expires: header equal to the Date: header, indicating that the page expires immediately.)
An “Expires: 0” header also prevents caching.
The HTTP result code must be 200 (success), 403 (Forbidden request), or 404 (URL not found).
The object must not be encrypted or protected by Secure Sockets Layer (SSL).
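The criteria above can be expressed as a single predicate. The following Python sketch is illustrative rather than the product's implementation: the function name is hypothetical, and two choices are assumptions (a response with no Expires header is treated as cacheable, and a response whose Expires header cannot be compared with a Date header is not cached).

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional
from urllib.parse import urlsplit

CACHEABLE_SCHEMES = {"ftp", "http", "gopher"}   # SSL (https) URLs are excluded
CACHEABLE_STATUS = {200, 403, 404}
BLOCKING_HEADERS = {"www-authenticate", "set-cookie"}


def _parse_date(value: str) -> Optional[datetime]:
    """Parse an HTTP date; return None for values such as "0" or ""."""
    try:
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def is_cacheable(method: str, url: str, status: int,
                 headers: Mapping[str, str]) -> bool:
    h = {key.lower(): value.strip() for key, value in headers.items()}
    if method.upper() != "GET":
        return False
    if "?" in url:                                   # no ?keywords allowed
        return False
    if urlsplit(url).scheme not in CACHEABLE_SCHEMES:
        return False
    if status not in CACHEABLE_STATUS:
        return False
    if BLOCKING_HEADERS & h.keys():
        return False
    if "no-cache" in h.get("pragma", "").lower():
        return False
    cache_control = h.get("cache-control", "").lower()
    if "private" in cache_control or "no-cache" in cache_control:
        return False
    if "expires" in h:
        expires = _parse_date(h["expires"])          # "Expires: 0" parses to None
        date = _parse_date(h.get("date", ""))
        if expires is None or date is None or expires <= date:
            return False
    return True
```

Note that the same predicate accepts 403 and 404 responses, which is what makes negative caching possible.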

The HTTP header may include “Cookies” which allow a server to customize a response for a particular user. Cookies are increasingly used for custom pages or for informal (that is, not very secure) authentication. Catapult Server will treat cookies as another optional HTTP header that will be disregarded, with the exception of the “Set-Cookie” header. It will be assumed that subsequent transactions after the cookie has been set can be cached, unless any of the above headers are included to indicate an object's non-cacheability.

Catapult Server caches data by means of several mechanisms:

Passive
Active
Negative

The remainder of this section discusses these mechanisms.

Passive Caching

Passive caching, also referred to as “on-demand” caching, is the basic mode of caching. The following figure depicts passive caching.

Catapult Server interposes itself between the client and the local or remote Web, and intercepts requests (for example, HTTP GET requests) from the client. Before forwarding a request to the Web, Catapult Server first calls into its cache (Urlcache.dll) to determine whether the cache can satisfy the request (using the RetrieveUrlFile API). If the data is in the cache and has not expired, it is returned immediately to the client using the Windows Sockets TransmitFile API. If an object has a last-modified date, the “If-Modified-Since” option on the HTTP GET command can be used to download a new copy of the page from the Web only if the page has changed since that date.

If the data is not cached or if the copy in the cache has expired, Catapult Server retrieves the data from the Web, returns it to the user, and inserts it into the cache (using the CreateUrlFile and CacheUrlFile APIs). If the local disk space reserved for the cache is too full to hold the new data, older objects are removed from the cache using a formula that factors in age, popularity and size.
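The on-demand behavior described above can be sketched as a small in-memory cache. This is an illustration only: the class and method names are hypothetical (they are not the Urlcache.dll APIs), a single fixed time-to-live stands in for per-object expiry, and the eviction score is an invented example of a formula that factors in age, popularity, and size. The actual formula is not given in this appendix.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Entry:
    body: bytes
    stored_at: float
    expires_at: float
    hits: int = 0


class PassiveCache:
    def __init__(self, capacity_bytes: int, ttl_seconds: float) -> None:
        self.capacity = capacity_bytes
        self.ttl = ttl_seconds
        self.entries: Dict[str, Entry] = {}

    def _used(self) -> int:
        return sum(len(entry.body) for entry in self.entries.values())

    @staticmethod
    def _eviction_score(entry: Entry, now: float) -> float:
        # Hypothetical formula: older, larger, less popular objects score higher
        # and are removed first.
        age = now - entry.stored_at
        return (age + 1.0) * len(entry.body) / (entry.hits + 1)

    def get(self, url: str, fetch: Callable[[str], bytes], now: float) -> bytes:
        entry = self.entries.get(url)
        if entry is not None and entry.expires_at > now:
            entry.hits += 1
            return entry.body                 # cached and not expired
        body = fetch(url)                     # missing or expired: go to the Web
        self.entries.pop(url, None)
        if len(body) <= self.capacity:
            # Make room by removing the highest-scoring objects.
            while self._used() + len(body) > self.capacity:
                victim = max(self.entries,
                             key=lambda u: self._eviction_score(self.entries[u], now))
                del self.entries[victim]
            self.entries[url] = Entry(body, now, now + self.ttl)
        return body
```

Passing the current time and the fetch function as arguments keeps the sketch deterministic and free of network access.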

The Catapult Server cache APIs are documented under “Persistent Cache API” in the ActiveX Development Kit, available at http://www.microsoft.com/intdev/.

Active Caching

Catapult Server uses active caching to improve client-perceived performance by increasing the likelihood that a requested object will be found in the cache. Active caching works as a superset of passive caching.

Typically, in passive caching, an object is placed in the cache and a Time-To-Live (TTL) is associated with that object. During this TTL, all requests for the object are serviced from the cache without generating traffic back to the upstream web server. After the TTL has expired, subsequent client requests for the object will generate traffic to and from the web server. The response from the server will be stored in the cache and a new TTL will be calculated.

Active caching augments this system by having the server automatically generate requests for a subset of objects. Catapult Server optimizes the choice of objects for active caching on the basis of the object’s:

Popularity. Ensures that objects requested by Catapult Server are likely to be requested by clients as well.

TTL. Objects with longer TTLs are more valuable to cache than objects with shorter TTLs; Catapult Server will also check objects that are close to expiration.

Server load. Catapult Server performs more aggressive active caching during periods of low server load than during high load.

Active caching results in:

Better Client Performance. Clients are more likely to find their URL in the cache, resulting in lower latency and better throughput.

Even Load Distribution. Active caching has the effect of “time-shifting” requests from busy periods to off-peak periods.

More Accurate Data. By checking unexpired objects during off-peak periods, the likelihood of returning stale data to clients is reduced.

Negative Caching

Negative caching consists of creating cache objects that represent HTTP error conditions associated with accessing a particular URL (for example, “URL not found”). These responses are cached and returned for subsequent client requests for the same URL. Specific HTTP error messages that are negatively cached include:

403: Forbidden request (typically caused by file system access controls)
404: URL not found

How the RWS Service Works

Understanding Windows Sockets and Remote Windows Sockets

Remote Windows Sockets Architecture

Windows Sockets APIs

Remote Windows Sockets Limitations

Understanding Windows Sockets and Remote Windows Sockets

About Windows Sockets

About Remote Windows Sockets

About Windows Sockets

Windows Sockets is a mechanism for interprocess communication between applications running on the same computer, or on different computers connected using a local area network (LAN) or wide area network (WAN). Windows Sockets defines a set of standard APIs that an application uses to communicate with one or more other applications, usually across a network. The APIs support initiating an outbound connection (for clients), accepting an inbound connection (for servers), sending and receiving data on those connections, and terminating the connection when done.

The Windows Sockets Specification defines a standard set of APIs that all Windows-based TCP/IP protocol stacks support and that network applications use. Windows Sockets also includes support for other transport protocols. Some Windows Sockets implementations support Internetwork Packet Exchange/Sequenced Packet Exchange (IPX/SPX) and NetBEUI.

Windows Sockets supports point-to-point connection-oriented communications (referred to as stream-oriented), and point-to-point or multipoint connectionless communications (referred to as datagram-oriented). When using the TCP/IP protocol suite:

Stream-oriented connections use the TCP protocol.
Datagram-oriented communications use the User Datagram Protocol (UDP).

Most Internet application protocols (HTTP, gopher, FTP, and so on) are connection-oriented client/server protocols. A client typically initiates a connection to a server in order to process a user request. A server waits for connections initiated by clients, accepts those connections, and begins communicating with the client following the rules of the specific application protocol.

In Windows Sockets, application communications channels are represented by data structures called sockets. A socket is identified by two items:

An address
A port

For example, a TCP/IP socket is associated with an IP address (a 32-bit number that uniquely identifies the local IP network interface), and a TCP or UDP port. The port identifies the virtual channel used for communications at the TCP/UDP level. A stream-oriented (TCP) connection is associated with a local address/port pair and a remote address/port pair.

A server executes the following steps to create a connection to a client:

The socket() API is used to establish a socket and associate it with a specific stream or datagram protocol (the stream-oriented TCP protocol, for example).

The bind() API is used to associate a local IP address and port with the socket. Most servers specify that they want to bind the socket to all local IP addresses, and indicate the well-known port for the application protocol (80 for HTTP, 21 for FTP, and so on).

The listen() API is used to enable inbound connections on the IP/port pair.

When a client connection attempt is received, the server uses the accept() API to complete the connection process, associate a different socket with the connection, and go back to the listening stage on the original socket to handle future client connections.

The server uses the recv() and send() APIs to communicate with the client.

The server can use the getsockname() and getpeername() APIs to query the local and remote IP address/port pairs, respectively.

A client typically initiates a connection to a server in order to process a user request. The client executes the following steps:

The socket() API is used to establish a socket and associate it with a specific stream or datagram protocol (the stream-oriented TCP protocol, for example).
The bind() API is used to associate a local IP address and port with the socket. Most clients specify that they are willing to use any local IP address and port.
The connect() API is used to initiate a connection to a specified IP address/port pair. The remote IP address specified identifies the server, and the port identifies the service (80 for HTTP, 21 for FTP, and so on).
The client uses the recv() and send() APIs to communicate with the server.
The client can use the getsockname() and getpeername() APIs to query the local and remote IP address/port pairs, respectively.
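The server and client sequences above map closely onto Python's socket module, which wraps the same sockets calls. The sketch below is an illustration under stated assumptions: it runs both sides on the loopback address with a system-assigned port so that it needs no privileges or network access, whereas a real server would bind to its well-known port. The function names are hypothetical.

```python
import socket
import threading


def serve_one(listener: socket.socket) -> None:
    """Server side: accept() one connection, then recv() and send()."""
    conn, _client_address = listener.accept()   # new socket for this connection
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())


def tcp_round_trip(payload: bytes = b"hello"):
    # Server: socket(), bind(), listen().
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))             # port 0: let the system choose
    listener.listen(1)
    server_address = listener.getsockname()
    worker = threading.Thread(target=serve_one, args=(listener,))
    worker.start()

    # Client: socket(), connect(), send()/recv(). The explicit bind() is
    # omitted, so the system picks any local address and port.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
        client.connect(server_address)
        client.sendall(payload)
        reply = client.recv(1024)
        local_pair = client.getsockname()       # local address/port
        remote_pair = client.getpeername()      # remote address/port
    worker.join()
    listener.close()
    return reply, local_pair, remote_pair, server_address
```

The remote pair reported to the client matches the address the server bound and listened on.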

Some Internet applications use the connectionless UDP protocol. UDP, a datagram protocol, does not guarantee reliability or sequencing of packets, and does not support resizing of packets, but offers higher performance than TCP. This is useful for real-time applications such as streaming audio and video. For example, RealAudio and VDOLive use UDP.

For UDP, the client and server each establish a UDP socket using the socket() API, bind that socket to a local IP address/port pair with the bind() API, and then immediately start sending and receiving data with the sendto() and recvfrom() APIs. These APIs specify the IP address and port to send to, and return the IP address and port received from. While most UDP-based protocols consist of a client communicating with a single server at a time, the connectionless protocol supports communications between a client and multiple servers over a single socket in the client application.
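The datagram exchange can be sketched the same way. This Python illustration runs both sockets on the loopback address with system-assigned ports; the function name is hypothetical, and the time-outs are added only so the sketch cannot block indefinitely.

```python
import socket


def udp_round_trip(payload: bytes = b"ping"):
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    with server, client:
        # Each side binds its socket to a local address/port pair.
        server.bind(("127.0.0.1", 0))
        client.bind(("127.0.0.1", 0))
        server.settimeout(5)
        client.settimeout(5)

        # No connection is set up: sendto() names the destination each time.
        client.sendto(payload, server.getsockname())
        data, sender = server.recvfrom(1024)      # recvfrom() reports the sender
        server.sendto(data[::-1], sender)         # reply to whoever sent it
        reply, replier = client.recvfrom(1024)

        return (data, sender == client.getsockname(),
                reply, replier == server.getsockname())
```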

About Remote Windows Sockets

Remote Windows Sockets makes a Windows Sockets application (running on a private network) perform as though it is directly connected to the Internet, when actually, there is a gateway computer connecting the two networks.

The client application calls Windows Sockets APIs to communicate with applications running on Internet computers, and the RWS components remote the necessary APIs to the gateway computer, thus establishing a communication path from the internal application to the Internet application through the gateway computer, totally transparent to the two applications. To the Internet (external) application, it appears that the application it is communicating with is running on the gateway computer.

The Catapult Server Remote Windows Sockets (RWS) service offers client and server support for most standard and custom Internet applications that communicate using Windows Sockets. The RWS service works with Windows-based TCP/IP applications on the private network, and with TCP/IP applications on any platform on the Internet.

The RWS service remotes Windows Sockets 1.1 applications. Almost all Windows Sockets 1.1 TCP/IP applications can be remoted. (However, remoting of Windows Sockets 2.0 APIs or applications is not supported.)

Remote Windows Sockets Architecture

Using TCP/IP on the Internal Network

Using IPX/SPX on the Internal Network

RWS Components

Remote Windows Sockets consists of a service running on a gateway computer, and a DLL installed on each client computer. On client computers, the Windows Sockets DLLs are renamed, and the RWS DLL is given the name of the corresponding Windows Sockets DLL (Winsock.dll for 16-bit; Wsock32.dll for 32-bit). This results in Windows Sockets-compatible applications linking to the RWS DLL. The RWS DLL links to the renamed Windows Sockets DLL.

The client DLL intercepts Windows Sockets API calls made by applications on the client computer. Depending on the API, and the current socket status, the client RWS DLL may completely process the client’s request, may pass the request to the (renamed) actual Winsock DLL on the local computer (after possibly making changes to the request), or may need to pass control information to the RWS service on the gateway computer.

For network communication between local applications (on the internal network), the RWS client DLL passes Windows Sockets API calls to the installed (and renamed) Windows Sockets DLL. Therefore, normal Windows Sockets communications continue to work. This is also true when a third-party TCP/IP stack and its Windows Sockets DLL are installed.

There are two versions of the RWS client DLL: a 16-bit version and a 32-bit version. The 16-bit version is installed on Windows 3.1 and Windows for Workgroups 3.11. The 32-bit version is installed on Windows NT. Both versions are installed on Windows 95.

The RWS service runs on Windows NT Server version 4.0 only. It runs as a stand-alone Windows NT service, and is responsible for creating virtual connections between internal applications and Internet applications. The RWS service is also responsible for doing “data pumping” between the two actual communications channels set up for a virtual connection, and acting as a protocol gateway if the internal network runs IPX/SPX.

RWS Control Channel

The RWS service and client DLLs communicate using a control channel that is set up when the client DLL is first loaded. The control channel uses the connectionless UDP protocol. UDP allows a single socket on the gateway computer to be used for communications with all RWS clients, and is faster than TCP. A simple acknowledgment protocol is used between RWS client and service to add reliability to the control channel.

The goal is to use the control channel as infrequently as possible, and to have as few Windows Sockets APIs as possible require special processing on the client computer. For example, for TCP connection requests, the control channel is used to set up the virtual connection, but once the connection is set up, sending and receiving data (send() and recv() APIs) requires no special processing on the client: the RWS DLL simply forwards these requests to the (renamed) Windows Sockets DLL. This also means that the Win32 APIs ReadFile and WriteFile, which bypass Windows Sockets, will work with remoted connections.

The control channel is used for the following purposes:

Routing information (RWS server to RWS client)
When the client first establishes the control channel, the server sends to the client the Local Address Table, which contains a list of internal IP addresses and subnets, so the client will know when requests need to be remoted.
TCP connections (RWS client to RWS server)
When a connection with a remote application is being established, the control channel is used in establishing the virtual connection. Once the connection is established, sending and receiving data will not require use of the control channel.
UDP communications (RWS client to RWS server when the UDP socket is bound, and RWS server to RWS client each time a new remote peer sends data to the internal application)
In order to support multiple remote applications communicating with the internal application, port-mapping information is sent to the client DLL each time a new remote peer sends data. Sending and receiving data to and from known peers does not require the control channel.
Database requests (RWS client to RWS server, and from RWS server to RWS client)
Remoting of the Windows Sockets database requests, such as DNS name resolution (gethostbyname(), and so on) is handled by passing the client request to the RWS service using the control channel, and the response is forwarded to the client DLL using the control channel.

When the first application on a client attempts to make its first Windows Sockets connection, the RWS DLL is loaded and initialized. At this time, the DLL establishes its control channel with the RWS service, and notifies the service, using the control channel, that it is active. The service then downloads the Local Address Table (LAT) to the client DLL.

(The LAT is a routing table that consists of a list of IP address pairs, each pair indicating a range of addresses located on the internal (private) network. The LAT is configured using the Network Configuration dialog box of the Catapult Server Setup program. The LAT configuration information is stored in the Iaslat.txt file, located by default at C:\ias\clients.)

For future connection attempts by applications, the RWS DLL attempts to determine if the application is trying to communicate with a local computer (private network) or remote computer (Internet). For connection attempts and Windows Sockets APIs destined for a local computer, the RWS DLL simply forwards the API calls to the (renamed) Windows Sockets DLL, for normal processing. If a Windows Sockets API call contains no information about the destination (and therefore no indication as to whether it should be remoted), the RWS component assumes it is a local request, and forwards the request to the standard Windows Sockets DLL.
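The local-or-remote decision can be sketched as a range lookup. This Python illustration assumes the LAT has already been read into a list of address pairs. The function names are hypothetical, the sample ranges in the usage note are examples rather than values from the product, and the on-disk layout of Iaslat.txt is not modeled.

```python
import ipaddress
from typing import Iterable, List, Optional, Tuple

AddressRange = Tuple[ipaddress.IPv4Address, ipaddress.IPv4Address]


def load_lat(pairs: Iterable[Tuple[str, str]]) -> List[AddressRange]:
    """Turn (first, last) address pairs into comparable range objects."""
    return [(ipaddress.IPv4Address(first), ipaddress.IPv4Address(last))
            for first, last in pairs]


def is_local(destination: Optional[str], lat: List[AddressRange]) -> bool:
    """Return True when the call should go to the standard Windows Sockets DLL."""
    if destination is None:
        # No destination information in the API call: assumed to be local.
        return True
    address = ipaddress.IPv4Address(destination)
    # Local if the address falls inside any range listed in the LAT.
    return any(first <= address <= last for first, last in lat)
```

For instance, with a table built from load_lat([("10.0.0.0", "10.255.255.255")]), a connection to 10.1.2.3 is handled locally, while a connection to an address outside every range is remoted to the gateway.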

When a Windows Sockets database API is called by an application (gethostbyname(), and so on) to resolve an Internet name or address, the RWS components work together, using the control channel, to remote the request to the gateway computer, and have the request processed on the Internet.

The architecture of RWS requires special processing by the client’s RWS DLL when establishing a connection with an Internet site, but once a communication channel is established, standard Windows Sockets and Win32 APIs for reading and writing a socket or file can be used with no special processing on the client. The application performs as if it is reading and writing the Internet site, while it is actually communicating with the RWS service, which is “proxying” the requests.

The control channel uses UDP port number 9321 on the RWS server and client computers.

The following illustration depicts the Remote Windows Sockets components on an IPX/SPX private network.

TCP Remoting

TCP handles point-to-point, connection-oriented communications. For each TCP connection requested by an internal application, two actual connections are set up by RWS. One connection is between the client application and the RWS service (using the RWS proxy server’s internal network interface), and the other is between the RWS service and the Internet application (using the RWS proxy server’s Internet interface). Data received from either connection is forwarded to the other connection, and both applications perform as though they are communicating directly with each other.

The RWS control channel is used only in setting up the TCP connection. Once the connection is set up, the control channel is not used, and send() and recv() APIs are simply forwarded by the RWS client DLL to the real Windows Sockets DLL (these APIs do not contain addresses, they simply refer to a socket). The data sent between RWS client and server is identical to that sent in a normal (non-remoted) connection. The ReadFile() and WriteFile() Win32 APIs work on the TCP socket connection even though, on Windows NT, these APIs are not handled by Windows Sockets (and therefore are not intercepted by the RWS DLL).
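The forwarding between the two connections can be pictured with a short Python sketch. It is an illustration only, not the RWS implementation: the names pump and relay are hypothetical, and connection setup over the control channel is left out. Each direction is copied by its own thread until that side closes.

```python
import socket
import threading

def pump(src, dst, bufsize=4096):
    """Copy bytes from src to dst until src reports end of stream."""
    while True:
        data = src.recv(bufsize)
        if not data:
            break
        dst.sendall(data)
    try:
        # Pass the end-of-stream on to the other connection.
        dst.shutdown(socket.SHUT_WR)
    except OSError:
        pass

def relay(client_side, internet_side):
    """Forward data in both directions between the two connections."""
    threads = [
        threading.Thread(target=pump, args=(client_side, internet_side), daemon=True),
        threading.Thread(target=pump, args=(internet_side, client_side), daemon=True),
    ]
    for t in threads:
        t.start()
    return threads
```

Because the bytes are copied unchanged, neither application needs to know that a second connection exists.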

In order to initiate an outbound TCP connection to an Internet site, an internal application binds a socket to a local IP address and port on its own computer. (Or the application can specify IP_ANY or PORT_ANY.) The application then uses the socket to connect to a specific remote IP address and port on the Internet computer.

The RWS components create a virtual connection as follows:

A socket connection is created between the internal application (on the local port specified by the application or assigned in the socket bind), and the RWS service (on the proxy computer’s internal IP address and an arbitrary port).
A socket connection is created between the RWS service (on its Internet IP and the same port used as the local port in the internal application’s bind), and the Internet application (on the remote IP address specified by the internal application and the remote port specified by the internal application).

Once an internal application’s socket has been remotely bound, RWS makes it appear that the socket is bound to the proxy computer’s Internet interface. If the internal application calls the getsockname() API, the data returned will indicate that the socket’s local IP address is that of the proxy computer. Thus, it appears to the application that it is on the Internet. This is necessary for protocols such as FTP, in which the client sends its local IP address to a server so that the server can initiate a new TCP connection back to the client.

When an internal application attempts to listen for a TCP connection initiated by an Internet application, RWS uses the local IP address to which the application’s socket is bound to determine whether the listen should be remoted. If the local IP address is that of the internal computer’s interface (a private network IP address), the listen will be local (passed to the Winsock DLL). If the IP address bound to the socket is that of the proxy server’s Internet interface, the listen will be remoted.
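A minimal Python sketch of this decision follows. The function name listen_target and the sample addresses are hypothetical, and the real RWS DLL may consult more state than shown here. Any bound address that is not one of the proxy server's Internet addresses, including a wildcard bind, falls through to a local listen.

```python
def listen_target(bound_ip, proxy_internet_ips):
    """Return 'remote' or 'local' for a listen() on a socket bound to bound_ip.

    proxy_internet_ips is the set of addresses on the proxy server's
    Internet interface (sample values only in this sketch).
    """
    if bound_ip in proxy_internet_ips:
        return "remote"
    # Private-network addresses and wildcard binds are treated as local.
    return "local"
```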

When a listen() API is remoted, RWS does the following:

Listens for a socket connection on the RWS service’s Internet IP address and the same port specified as the local port in the internal application’s socket bind.
When an external site connects to the port, creates a socket connection between the internal application (on the local port specified by the application), and the RWS service (on its internal IP address and an arbitrary port). This connection is initiated by the RWS service, because the internal application is listening for an incoming connection.

Once an internal application’s socket is bound to the proxy server’s Internet IP address and an inbound connection is established, a getsockname() API call by the application will return the proxy’s Internet IP address as the local IP address, and a getpeername() API call will return the Internet site’s IP address and port as the remote IP address and port.

The following illustration shows Remote Windows Sockets remoting a TCP connection.

UDP Remoting

UDP offers connectionless communications, and supports multiple applications communicating with an application over the same UDP socket. An application uses sendto() to send data, specifying the destination IP address, and recvfrom() to receive data, returning the source IP address.

When an internal application binds a UDP socket, the RWS service binds a UDP socket to its Internet IP address, and the same local port as used by the client. This is the socket used for communications between all Internet peers for the internal application.

When an internal client computer receives a packet over UDP from the Internet, the packet was actually forwarded by the RWS proxy server, and the source address will be that of the proxy server. The RWS client DLL needs to change the source port and IP address to that of the actual Internet source before the internal application receives the data. However, the problem is that for UDP there can be multiple sources of data sent to one destination socket.

In other implementations, this problem is sometimes handled by having the proxy service add a header to the data (which contains the original source port and IP address) before forwarding it to the internal client. The client DLL would then strip off this header and modify the source IP address and port passed to the application. This solution requires much work on every data packet, including a buffer copy, and may even result in the buffer size being larger than the maximum allowed. In this case, splitting the data into multiple packets needs to be supported, as well as ordering and recombining at the destination. Also, this solution prevents Win32 APIs from working. (On Windows NT, Win32 APIs are not passed to Windows Sockets.)

Instead, the problem of multiple-source IP addresses is solved by creating a separate UDP socket in the proxy server for each Internet peer sending data to the client. Each time the first data packet is received from a new Internet port and IP address, the RWS server creates a new UDP socket on a different local port, in the proxy server (bound to the proxy’s internal IP interface). The RWS service maintains a table that maps Internet ports and IP addresses to the port number of the RWS server’s socket for that Internet site. Each time it changes, the mapping table is forwarded to the RWS client DLL using the control channel.

When the RWS service receives data from an Internet application destined for the client, it sends the data to the client using the associated socket on the proxy server. The RWS client DLL looks at the source (remote) port number of the data packet (proxy-server port number), and uses the table to map that to an Internet application’s port and IP address. The internal application is handed the Internet port and IP address as the source. The result is that handling UDP communications does not require extra control channels, does not cause data packets to be modified, and does not require use of the control channel when data is sent from an Internet peer that the RWS service already knows about. Win32 APIs also work for reading and writing the socket.

When the internal application sends data to one of the remote peers, the RWS DLL uses the mapping table to map the destination port and IP address (specified by the internal application) to an RWS server port, and sends the data to the appropriate UDP socket (port) on the RWS server computer.
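The mapping table described above can be sketched in Python as follows. The class name UdpPeerMap, its method names, and the starting port are assumptions made for illustration, since the actual table format is not described here. The sketch also does not cover what happens when the application sends to a peer that is not yet in the table; in that case it simply raises KeyError.

```python
class UdpPeerMap:
    """Maps Internet peers (ip, port) to per-peer ports on the proxy's internal interface."""

    def __init__(self, first_port=5000):
        self._next_port = first_port   # hypothetical starting port
        self.peer_to_port = {}
        self.port_to_peer = {}

    def port_for_peer(self, peer):
        """Service side: return the internal port for peer, allocating one if peer is new.

        Returns (port, changed). changed is True when the table grew, which is
        when it would be forwarded to the client DLL over the control channel.
        """
        if peer in self.peer_to_port:
            return self.peer_to_port[peer], False
        port = self._next_port
        self._next_port += 1
        self.peer_to_port[peer] = port
        self.port_to_peer[port] = peer
        return port, True

    def source_for_port(self, proxy_port):
        """Client side: translate the proxy's source port back to the Internet peer."""
        return self.port_to_peer[proxy_port]

    def port_for_destination(self, peer):
        """Client side: choose the proxy port to send to for an outbound datagram."""
        return self.peer_to_port[peer]
```

Because the peer is identified by the port the datagram arrives from, the payload itself is never modified.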

Because this mechanism requires a new socket for each Internet peer application, extra resources are used in the proxy server when an internal application uses UDP to communicate with many remote peers. Most Internet client applications that use UDP (RealAudio, VDOLive, and so on), communicate with a single server application, so this is an efficient trade-off. For other UDP client applications, the number of servers communicating with the client is usually small.

When an internal application calls getsockname() for a remoted UDP socket, the local IP address returned is that of the proxy’s Internet interface.

The following illustration depicts remoted UDP communications that use Remote Windows Sockets.

Using TCP/IP on the Internal Network

When the internal network runs TCP/IP, a TCP/IP application could try to communicate with a local (internal network) or remote (Internet) application. When the RWS client DLL initializes, it receives from the RWS service, using the control channel, a Local Address Table, which contains a list of IP addresses and subnets that are located on the private network. Future communication attempts by applications on the client computer with a specific IP address can be routed locally or remotely by the RWS DLL, as appropriate. If communication is attempted with a local IP address, the RWS DLL simply forwards the request to the real Windows Sockets DLL, with no special processing.
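The local-or-remote decision amounts to a lookup against the Local Address Table. The Python sketch below is illustrative only: the function name is_local, the table layout (a list of CIDR strings), and the sample addresses are assumptions, not the format RWS actually downloads.

```python
import ipaddress

def is_local(dest_ip, local_address_table):
    """Return True if the call should go straight to the standard Windows Sockets DLL.

    dest_ip is a dotted-quad string, or None when the API call carries no
    destination. local_address_table is a list of CIDR strings (hypothetical
    layout) describing the private network.
    """
    if dest_ip is None:
        # No destination information: treated as a local request.
        return True
    addr = ipaddress.ip_address(dest_ip)
    return any(addr in ipaddress.ip_network(net) for net in local_address_table)

lat = ["10.0.0.0/8", "192.168.1.0/24"]   # sample table, made up for this sketch
```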

In some cases an application attempts to communicate, but the RWS DLL cannot determine whether the application is trying to communicate with a local computer or a remote computer. For example, a typical server application binds a socket to all local IP addresses (IP_ANY), and then listens on that socket. If RWS cannot in any way determine if a listen() API should be local or remote, it will assume local (the more secure of the two).

If multiple internal servers are to be set up to do remoted listen() APIs on the same port at the same time, they must use different IP addresses on the proxy server’s Internet interface. For example, if two internal Exchange (IMC) servers will be listening on the SMTP port (port 25), the proxy server must have two IP addresses assigned to its Internet interface, one for each Exchange (IMC) server. Because both internal servers listen() on the same port, and the two listen() calls must be remoted to the same port on the proxy computer, the only way to distinguish them on the proxy is by using different IP addresses.
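The constraint can be pictured as a table keyed by Internet IP address and port. The Python sketch below uses hypothetical names (RemoteListenRegistry and the server labels) and sample addresses; it shows only why a second listener on the same address and port cannot be told apart from the first.

```python
class RemoteListenRegistry:
    """Tracks which internal server owns each remoted (Internet IP, port) pair."""

    def __init__(self):
        self._listeners = {}   # (internet_ip, port) -> internal server name

    def register(self, internet_ip, port, internal_server):
        key = (internet_ip, port)
        if key in self._listeners:
            # Same address and same port: nothing is left to distinguish the two.
            raise ValueError(
                f"{internet_ip}:{port} is already remoted for {self._listeners[key]}"
            )
        self._listeners[key] = internal_server

    def route(self, internet_ip, port):
        """Return the internal server that should receive an inbound connection."""
        return self._listeners[(internet_ip, port)]
```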

Using IPX/SPX on the Internal Network

If the internal network does not run TCP/IP, it is assumed that all attempts by Windows Sockets applications to communicate over TCP/IP are to be remoted to the Internet. No routing information is transferred to the RWS DLL at initialization time.

When the internal network runs IPX/SPX, the principle of how communications are remoted is identical to that used when the internal network runs TCP/IP. When an internal application attempts to establish communications over TCP/IP, the RWS DLL changes the Windows Sockets API parameters to those appropriate for IPX/SPX (address reformatting, and so on), and the communications between client and RWS server are handled over IPX/SPX. The RWS server acts as a protocol gateway, converting between IPX/SPX on the private network and TCP/IP on the Internet. In addition to standard remoting functionality, the following tasks are accomplished by the RWS DLL when establishing remote communications with IPX/SPX:

socket() API. When an application specifies a protocol of TCP or UDP in a socket API call, the RWS DLL changes this to the appropriate IPX/SPX protocol.
bind() API. When an application specifies a local IP address to bind a socket to, RWS converts this to a local IPX/SPX address. A request to bind to IP_ANY is also converted.
connect() API. When an application attempts to connect to an Internet application, the address passed to the Windows Sockets DLL needs to be the IPX address of the proxy server’s internal interface.
sendto() API. The destination IP address needs to be converted to the IPX address of the proxy server’s internal interface.
recvfrom() API. The source address returned needs to be converted from the IPX address of the proxy server to the IP address of the Internet application.
The control channel uses IPX instead of UDP.

Windows Sockets APIs

socket()

bind()

connect()

listen() and accept()

recv() and send()

recvfrom() and sendto()

socket()

The socket() API is used by applications to establish a socket and associate it with a protocol (TCP, UDP, and so on). The socket() API requires no special processing by the RWS DLL if the internal network runs TCP/IP; the API is simply passed to the standard Windows Sockets DLL for the local creation of the socket. If the local network runs IPX/SPX, the RWS DLL needs to change the protocol specified in the socket() API call (UDP or TCP) to the appropriate IPX/SPX protocol.

bind()

After calling the socket() API, clients may call the bind() API to bind the socket to a specific local interface (IP address) and port. This API is intercepted by the RWS DLL and forwarded to the RWS service using the control channel.

The RWS service, in preparation for an attempt by the application to create a remote connection using this socket, creates one socket on the gateway computer for a UDP socket, and two sockets on the gateway computer for a TCP socket. This is done by calling the socket() and bind() APIs for the one or two new sockets. One new socket will be bound to the same port number that the client specified in its bind(), and the IP address of the gateway computer’s Internet interface. For TCP, the second socket is bound to the IP address of the gateway computer’s internal interface (and an arbitrary port). The client’s socket bind() request will then be passed to the Winsock DLL on the client computer, for normal processing.
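As a rough sketch of the bookkeeping this implies, the Python function below lists the address and port pairs the service would bind for one client bind(). The function name gateway_binds and the sample addresses are hypothetical. Port 0 stands for "an arbitrary port", following the usual sockets convention.

```python
def gateway_binds(protocol, client_port, internet_ip, internal_ip):
    """List the (ip, port) pairs bound on the gateway for one client bind()."""
    if protocol not in ("tcp", "udp"):
        raise ValueError("protocol must be 'tcp' or 'udp'")
    # First socket: Internet interface, same port the client asked for.
    binds = [(internet_ip, client_port)]
    if protocol == "tcp":
        # Second socket (TCP only): internal interface, arbitrary port.
        binds.append((internal_ip, 0))
    return binds
```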

connect()

An application uses the connect() API to initiate an outbound TCP connection to a remote IP address and port pair. If the RWS DLL determines, by looking up the IP address in the downloaded Local Address Table, that the application is attempting to connect to a remote (Internet) site (or if the local network runs IPX/SPX), the DLL forwards the request to the RWS service using the control channel.

The RWS service performs these actions:

A connect() API call is made on the socket that was previously bound to the Internet interface (see the preceding section on bind()) to establish a connection with the remote site at the IP address and port specified in the client’s connect().
For the other socket, which was bound to the interface on the internal network, the listen() and accept() APIs are used to establish the connection with the client.

The RWS DLL passes the connect() API to the Windows Sockets DLL, but first changes the IP address of the remote computer to that of the gateway computer’s internal interface (or converts it to the gateway computer’s IPX address, if the local network runs IPX/SPX). The listen() and accept() APIs used by the RWS service will complete the establishment of this connection.

The result is that the RWS service on the gateway computer has two socket endpoints that represent communications channels with the two communicating applications.
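The client-side half of this exchange can be sketched in Python. Everything named here is an assumption for illustration: plan_connect, the dictionary standing in for a control-channel request, and gateway_internal_addr, which represents whatever address the service's internal socket is listening on. The control-channel message format itself is not described here.

```python
import ipaddress

def plan_connect(dest, local_address_table, gateway_internal_addr, ipx_only=False):
    """Return (winsock_dest, control_request) for a client connect() to dest.

    dest is an (ip, port) pair. control_request is None for a local connect;
    otherwise it is a hypothetical record of what the DLL would forward to
    the RWS service over the control channel.
    """
    ip, port = dest
    addr = ipaddress.ip_address(ip)
    local = not ipx_only and any(
        addr in ipaddress.ip_network(net) for net in local_address_table
    )
    if local:
        # Local destination: hand the call to Windows Sockets unchanged.
        return dest, None
    request = {"op": "connect", "remote_ip": ip, "remote_port": port}
    # Remote destination: the local connect() is redirected to the gateway.
    return gateway_internal_addr, request
```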

listen() and accept()

When a client allows an inbound connection from a remote computer, it calls the listen() API. If the RWS DLL determines from the Local Address Table, or configuration information, that the client is attempting to establish a connection with an Internet computer, the DLL forwards the listen() API to the RWS service using the control channel.

The RWS service will do a listen() on the socket bound to its Internet interface when the client did its bind(). When the remote application attempts to connect to the RWS service’s socket, the service will do an accept() to complete the connection process. The service will then do a connect() on the internal socket to establish a connection with the internal client application. The client application will then call the accept() API to complete the connection process.

The result is that the RWS service on the gateway computer has two socket endpoints that represent the two communicating applications.

recv() and send()

Once a TCP socket connection is established between the internal application and the RWS service, and a corresponding connection is established between the RWS service and the remote application, the client can receive and send data with the recv() and send() APIs.

The RWS service uses the recv() API on both connections for receiving data packets. When a client sends data, the data is actually sent to the RWS service, because it has the socket endpoint of the client’s connection. The RWS service simply receives the data on that connection, and sends it to the remote computer using the other connection.

When the remote application sends data to the client, the RWS service receives this data and sends it to the client application.

Receiving and sending data on the client computer requires no special handling on the part of the RWS DLL. The DLL simply passes these API calls on to the (renamed) Windows Sockets DLL. The RWS service does all of the special handling, by passing the data to the associated connection. This results in high performance when sending and receiving data on the client. The Win32 APIs for reading and writing files, when applied to a remoted socket, will work successfully.

recvfrom() and sendto()

The recvfrom() and sendto() APIs are most often used with UDP connectionless communications. The sendto() API requires an IP address and port destination, and the recvfrom() API returns an IP address and port of the originator of the data.

When the internal application does a bind() on a UDP socket, the RWS service binds a socket on the gateway computer’s external (Internet) interface, to send and receive data to and from remote applications. Once this socket is bound, remote servers can send data, destined for the internal application. Each time a UDP data packet is received from a new IP address and port pair, the RWS service creates a new UDP socket on the gateway computer and binds it to a different port on the gateway’s internal interface.

The RWS service maintains a mapping table of remote IP address and port pairs (that have sent data) with the port number of the corresponding internal-interface socket in the gateway computer. This mapping table is downloaded to the RWS DLL on the client computer using the control channel.

When a UDP data packet is received from a remote computer, the RWS service looks up the internal socket (based on the remote computer’s IP address and port) to use for that remote computer, and sends the data to the internal computer using the corresponding socket. The RWS DLL on the client computer will receive the data packet from the gateway computer’s IP address, and the port that it was sent from can be used to look up the IP address, and port, of the originating remote computer. The RWS DLL will replace the source information, making it appear that the data came directly from the remote computer.

When the client (internal) application sends data to a remote computer, the RWS DLL intercepts the request and modifies the destination to send it to the RWS service using one of the RWS service’s sockets bound to the internal IP interface. In order to determine which RWS service socket to use, the RWS DLL looks up the final destination IP address and port pair in the UDP mapping table, and that indicates which RWS service socket (port) to send to.

Remote Windows Sockets Limitations

Version 1.0 of Catapult Server remotes Windows Sockets 1.1 applications. Remoting of Windows Sockets 2.0 APIs or applications is not supported. Almost all Windows Sockets 1.1 TCP/IP applications can be remoted. This section describes limitations that may prevent specific protocols or applications from working through the Catapult Server Remote Windows Sockets service.

When an internal application receives an inbound TCP connection from an Internet site, the internal application’s listen() API needs to be remoted to the RWS service computer’s Internet interface. If multiple internal computers will be running that same application, and therefore listening on the same port at the same time, each one needs to use a different Internet IP address on the RWS server in order to distinguish them (because the port is the same).

There are a small number of APIs that cannot be handled properly in a remoted environment. Following is a list of APIs that will not be remoted properly:

duplicatehandle()
getsockopt()

How the Proxy and RWS Services Work Together

If the RWS and Proxy services are used together, they can be configured so that the features of each service complement the other. The server and each client are configured as follows:

The client computer is configured as an RWS client.
The client’s Internet browser is configured to use the Catapult Server Proxy service.
If the private network is running TCP/IP, the Local Address Table is configured to specify that the Proxy server’s internal IP address is not on the local network. This forces the use of RWS between the client and Catapult Server. The Local Address Table must be modified on all computers on the private network running Catapult Server. However, if the private network is running only IPX/SPX, this step is unnecessary.

For HTTP, FTP, and gopher requests from the browser, the RWS and Proxy services work together as follows:

The browser sends a proxy request to the IP address of the Proxy server.
The RWS client DLL looks up the IP address in the Local Address Table and determines that the request needs to be remoted. Or, if the local network is running IPX/SPX, the RWS client DLL automatically remotes all TCP connects.
The RWS client DLL works with the RWS service to set up a socket connection between the client and the RWS service, and another socket connection from the RWS service to the Internet Information Server running the Proxy ISAPI components. (The RWS service and the Proxy ISAPI components may be on the same server or on separate servers.)
Client requests are sent from the client to RWS. (RWS provides Windows NT Challenge/Response authentication and support for both IPX/SPX and TCP/IP on the internal network.)
The RWS service forwards the request to the Proxy server. (The Proxy ISAPI application provides caching.)
The Proxy ISAPI application issues Internet requests for resources not found in the cache.

The following illustration depicts the Proxy and Remote Windows Sockets services used together, with IPX/SPX running on the internal network.

Authentication and Security

Integration with Windows NT

Anonymous Connections

Client Requests Containing Credentials

Internet Service Manager Authentication Options

Other Authentication Issues

Catapult Server takes advantage of the authentication and security architecture of Microsoft Internet Information Server (IIS). This section discusses the IIS architecture with respect to these functions.

Integration with Windows NT

The WWW, gopher, and FTP services included with Microsoft Internet Information Server are fully integrated with Windows NT Server user accounts and file access permissions.

These services provide access to a resource (file, HTML page, ISAPI application, and so on) on behalf of a Windows NT user. The service “impersonates” the user by supplying a user name/password pair in the attempt to read or execute the resource for the user.

The NTFS file system allows Access Control Lists (ACLs) to be assigned to files and directories. ACLs grant or deny access to the associated file or directory by specific Windows NT user accounts, or groups of users. When an Internet service attempts to read or execute a file on behalf of a client request, the user account offered by the service must have permission, as determined by the ACL associated with the file, to read or execute the file. If the user account does not have permission to access the file the request fails, and a response is returned informing the client that access has been denied.

File and directory ACLs can be configured using Windows NT Explorer.

Anonymous Connections

IIS processes an anonymous connection when a client request does not contain a user name and password. This occurs under the following conditions:

An FTP client logs on with the user name “anonymous”
All gopher requests
A Web (HTTP) request, when the header does not contain a user name and password

Each Internet service maintains a Windows NT user name and password to be used for the processing of anonymous requests. When an anonymous request is received, the service “impersonates” the user configured as the “anonymous logon” user. The request will succeed if the “anonymous logon” user has permission to access the requested resource, as determined by the resource’s ACL. For the WWW service only, if the user does not have permission to access the resource, the response returned to the client contains a list of supported authentication schemes for gaining access to the resource.

The “anonymous logon” user account can be viewed and modified on the Service property sheet of the Internet Service Manager. Multiple Internet Information Server services running on the same computer can use the same, or different “anonymous logon” user accounts. Including the “anonymous logon” user account in file or directory ACLs allows for precise control of the resources available to “anonymous” clients.

The “anonymous logon” user account specified must be a valid Windows NT user account on the server computer, and the password specified must match the password for this user in the computer’s user database. User accounts and passwords are configured using the Windows NT User Manager.

When the Internet Information Server product is installed, the Setup program creates a user account on the server computer to be used for anonymous connections. The user name of this account has the form “IUSR_computer name.” For example, if the server’s computer name is WEB1, the user name created will be “IUSR_WEB1.” The same “anonymous logon” user account is set up for all Internet Information Server services installed on the computer. The account is made a member of the computer’s Guest group. This will, in most cases, give anonymous client requests access to public content published on the server. Run the Network application in Control Panel to see the computer name.

A randomly generated password is created for the “IUSR_computername” account. For maximum convenience and security, we suggest that you change the password associated with this account to a password that you will remember, but that is not easily guessed. To do this, you must specify the new password for the account in User Manager, and on the Service property sheet of Internet Service Manager for each IIS service installed.

When the Internet Information Server is installed on a primary or secondary domain controller, the “anonymous logon” user account is created in the user account database of the domain. When IIS is installed on a domain member-server, or a stand-alone server, the account is created on the local computer.

If IIS is installed on multiple domain controllers of the same domain, a separate user account is created in the domain user database for each Internet server computer. This does not cause any conflicts because each user name is unique and contains the name of the associated computer. However, you may find it more convenient to create a single “anonymous logon” user account in the domain to use for all IIS domain controllers in the domain. This can simplify administration of ACLs. To do this, follow these steps:

In User Manager, create a new “anonymous logon” user account in the domain. Be sure that this account is made a member of appropriate groups, given a secure password, and is given the User Right (in the Policies menu) to “Log on Locally”.
On the Service property sheet of Internet Service Manager, specify the new “anonymous logon” user name and password. You must do this for each IIS service running on all primary and secondary domain controllers in the domain.
When later installing IIS on other domain controllers in the domain, be sure to use Internet Service Manager to modify the “anonymous logon” user name and password to match those created with User Manager. Do this for each IIS service installed.

Client Requests Containing Credentials

A request containing credentials is one of the following:

An FTP client logs on with a valid Windows NT user name and password. This requires that the FTP service’s “Allow only anonymous connections” checkbox be cleared.
Warning FTP sends passwords across the network in clear text.
A WWW (HTTP) request’s headers contain a user name and password. This is HTTP Basic authentication.
Warning HTTP Basic authentication sends passwords across the network in clear text.
A WWW browser supports Windows NT Challenge/Response authentication, and an anonymous client request is denied access to a resource. In this case, the browser automatically sends the Windows client’s user name and password to IIS using the encrypted Windows NT Challenge/Response protocol. In this release, only Internet Explorer 2.0 for Windows 95 supports Windows NT Challenge/Response authentication.

When an Internet Information Server service receives a client request that contains credentials (a user name and password), the “anonymous logon” user account is not used in processing the request. Instead, the user name and password received from the client are used by the service. If the service is not granted permission to access the requested resource while “impersonating” the specified user, the request fails, and an error notification is returned to the client.

For the WWW service (HTTP) only, when an anonymous request fails because the “anonymous logon” user account does not have permission to access the desired resource, the response to the client indicates which authentication schemes the service supports. This is determined by the configuration of the WWW service’s authentication features. If the response indicates to the client that the service is configured to support HTTP Basic authentication, most Web browsers will display a user name/password dialog box, and reissue the anonymous request as a request with credentials, including the user name and password entered by the user.

If a Web browser supports Windows NT Challenge/Response authentication, and the WWW service is configured to support Windows NT Challenge/Response authentication, an anonymous WWW request that fails due to permissions will result in automatic use of the Windows NT Challenge/Response protocol to send a user name and encrypted password from the client to the service. The client request will then be reprocessed, using the client’s user information. The user account obtained from the client is the one with which the user is logged on to the client computer. Because this account, including its Windows NT domain, must be a valid account on the Web server computer, Windows NT Challenge/Response authentication is most useful in a private network environment, where the client and server computers are in the same domain or in trusted domains. In this release, Internet Explorer for Windows 95 is the only browser that supports Windows NT Challenge/Response authentication.

Internet Service Manager Authentication Options

In addition to the “anonymous logon” user name and password fields, the Service property sheet of Internet Service Manager contains the following authentication options:

WWW

Allow Anonymous: When this check box is selected, anonymous connections are processed, and the “anonymous logon” user name and password are used for these connections. When this check box is cleared, all anonymous connections are rejected. In this case, basic or Windows NT Challenge/Response authentication can be used to access content.
Basic: When this check box is selected, the WWW service will process requests using Basic authentication.
Warning Basic authentication sends Windows NT user names and passwords across the network without encryption. This checkbox is cleared by default for security reasons.
Windows NT Challenge/Response: When this check box is selected, the service will honor requests by clients to send user account information using the Windows NT Challenge/Response protocol. This protocol uses encryption for secure transmission of passwords. The Windows NT Challenge/Response authentication process is initiated automatically as a result of an “access denied” error on an anonymous client request. Currently, Windows NT Challenge/Response authentication only works with Internet Explorer 2.0 for Windows 95.

Note If the Basic and Windows NT Challenge/Response check boxes are both cleared (and the Allow Anonymous check box is selected), all client requests are processed as anonymous requests. In this case, if the client supplies a user name and password in the request, this user name and password are ignored by the WWW service. The “anonymous logon” user account will be used to process the request.
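The interaction of these three check boxes can be summarized in a Python sketch. It is a simplified model, not IIS code: the function name process_www_request and the scheme labels are hypothetical, the ACL is reduced to a set of user names, and password verification is omitted. Treating credentials sent with a disabled scheme as an anonymous request is an assumption that generalizes the Note above.

```python
def process_www_request(acl, anonymous_user, credentials=None, *,
                        allow_anonymous, basic, challenge_response):
    """Decide how a WWW request is processed.

    acl is the set of user names allowed to read the resource. credentials is
    None or a (scheme, user) pair, where scheme is "basic" or
    "challenge_response". Returns (status, detail): on success detail is the
    impersonated user, and on an anonymous denial it lists the schemes offered.
    """
    enabled = [name for name, on in (("basic", basic),
                                     ("challenge_response", challenge_response)) if on]
    if credentials is not None and credentials[0] in enabled:
        user = credentials[1]
        # The "anonymous logon" account is not used for this request.
        return ("ok", user) if user in acl else ("denied", [])
    # No usable credentials: the request is handled as anonymous.
    if not allow_anonymous:
        return ("denied", enabled)
    if anonymous_user in acl:
        return ("ok", anonymous_user)
    return ("denied", enabled)
```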

FTP

Allow Anonymous Connections: When this check box is selected, FTP logons in which the user enters a user name of “anonymous” will be processed. These anonymous connections will be processed on behalf of the Windows NT user account specified on the Service property sheet. When this check box is cleared, users will be required to enter valid Windows NT user names and passwords to log on to the FTP service.
Allow only anonymous connections: When this check box is selected, user logons with a user name other than “anonymous” will be rejected.

Warning FTP user names and passwords are sent across the network in clear text. When the Allow only anonymous connections check box is cleared, users can log on with Windows NT accounts, and their Windows NT passwords will be sent to the server without encryption. This check box is selected by default for security reasons.
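The effect of the two FTP check boxes on a logon attempt can be sketched as follows. The function and parameter names are hypothetical, and the sketch does not validate the password of a non-anonymous user; it only shows which logons are accepted and which account they run as.

```python
def ftp_logon_account(
    user_name: str,
    allow_anonymous: bool,
    allow_only_anonymous: bool,
    anonymous_account: str,
) -> str:
    """Return the Windows NT account an FTP logon would be processed as."""
    if user_name == "anonymous":
        if not allow_anonymous:
            # Allow Anonymous Connections is cleared: a valid Windows NT
            # user name and password are required instead.
            raise PermissionError("anonymous logons are not allowed")
        # Processed on behalf of the account named on the Service
        # property sheet.
        return anonymous_account

    if allow_only_anonymous:
        # Any user name other than "anonymous" is rejected, so no
        # Windows NT password is ever sent in clear text.
        raise PermissionError("only anonymous logons are allowed")

    # Assumption: the caller still validates this user name and password
    # as a Windows NT account.
    return user_name
```

With both check boxes selected, `ftp_logon_account("anonymous", True, True, "IUSR_WEB")` returns the anonymous account (the name `IUSR_WEB` is only an example), and any other user name raises `PermissionError`.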

Other Authentication Issues

SSL

“INTERACTIVE” and “NETWORK” Users

Log on Locally

Customized Authentication

SSL

SSL is a WWW feature that supports data encryption and server authentication. All data sent to and from the client using SSL is encrypted. If HTTP Basic authentication is used in conjunction with SSL, the user name and password are transmitted after being encrypted by the client’s SSL support. In this release, the only Web browser that supports SSL is Internet Explorer 2.0 for Windows 95.
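To see why Basic authentication benefits from SSL, it helps to look at what a Basic credential contains. In HTTP Basic authentication the client sends the string `user:password` encoded with base64, which is an encoding and not encryption. The sketch below uses only the Python standard library; the helper names and the sample credentials are made up for the example.

```python
import base64
from typing import Tuple


def basic_authorization_header(user: str, password: str) -> str:
    """Build the header line a client sends for HTTP Basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Authorization: Basic {token}"


def decode_basic_authorization(header: str) -> Tuple[str, str]:
    """Recover the user name and password from a Basic header line.

    No key is needed: anyone who can read the header can do this.
    """
    token = header.split(" ", 2)[2]
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password


header = basic_authorization_header("alice", "secret")
print(header)
print(decode_basic_authorization(header))
```

Because decoding requires no secret, the header is readable by anyone who can observe the network traffic. When the connection uses SSL, the same header travels inside the encrypted channel, which is the protection described above.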

INTERACTIVE and NETWORK Accounts

If you use the predefined Windows NT accounts INTERACTIVE and NETWORK for access control, they may affect client access to some resources. For a file to be accessible to anonymous client requests or to client requests using Basic authentication, the file must be accessible by the INTERACTIVE account. For a file to be accessible to client requests using Windows NT Challenge/Response authentication, the file must be accessible by the NETWORK account.
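The rule above amounts to a lookup from authentication method to the predefined account that must have access. The following sketch states it as a table; the method labels, the function name, and the model of a file's permissions as a set of account names are assumptions made for illustration.

```python
from typing import Set

# Which predefined account must have access to a file, by the
# authentication method of the client request.
REQUIRED_ACCOUNT = {
    "anonymous": "INTERACTIVE",
    "basic": "INTERACTIVE",
    "nt_challenge_response": "NETWORK",
}


def file_reachable(auth_method: str, accounts_with_access: Set[str]) -> bool:
    """Return True if the predefined account needed for this method has access.

    accounts_with_access is a hypothetical stand-in for the file's
    permissions, considering only INTERACTIVE and NETWORK.
    """
    return REQUIRED_ACCOUNT[auth_method] in accounts_with_access
```

For instance, a file that grants access only to NETWORK is reachable through Windows NT Challenge/Response requests but not through anonymous or Basic requests.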

Log on Locally

When you configure a Windows NT user account to serve either as the Internet Information Server “anonymous logon” account or as a user account specified by client requests using HTTP Basic authentication, be sure to grant the account the “Log on locally” user right. You set this right from the Policies menu of User Manager for Domains.

Customized Authentication

If you need a WWW request authentication scheme that the service does not support directly, obtain a copy of the Internet Server API (ISAPI) Software Development Kit (SDK) and read the ISAPI filters specification, which describes how to develop user-written ISAPI filter DLLs that handle request authentication.